Document image retrieval through word shape coding.

Internal identifier: 000B18 (Main/Exploration); previous: 000B17; next: 000B19

Authors: Shijian Lu [Singapore]; Linlin Li; Chew Lim Tan

Source:

IEEE Transactions on Pattern Analysis and Machine Intelligence [eISSN 1939-3539]; 2008.

RBID: pubmed:18787240

English descriptors

KwdEn:
- Artificial Intelligence; Automatic Data Processing (methods); Database Management Systems; Databases, Factual; Documentation (methods); Image Enhancement (methods); Image Interpretation, Computer-Assisted (methods); Information Storage and Retrieval (methods); Language; Pattern Recognition, Automated (methods); Reading.
MeSH:
- methods: Automatic Data Processing; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Pattern Recognition, Automated.
- Artificial Intelligence; Database Management Systems; Databases, Factual; Language; Reading.
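On this page the two MeSH lists contain exactly the KwdEn headings, regrouped. Entries carrying the suffix "(methods)" appear under the `methods` qualifier with the suffix dropped. The remaining entries are listed without a qualifier.

The Python sketch that follows reproduces that regrouping. `KWD_EN` and `group_by_qualifier` are hypothetical names. The sketch assumes that a qualifier is always written as one trailing parenthesized group. That assumption is also why commas cannot serve as delimiters: several headings, such as "Databases, Factual", contain commas themselves.

```python
import re
from collections import defaultdict

# KwdEn headings as listed on this page; a trailing "(qualifier)" marks a
# MeSH subheading attached to the main heading.
KWD_EN = [
    "Artificial Intelligence",
    "Automatic Data Processing (methods)",
    "Database Management Systems",
    "Databases, Factual",
    "Documentation (methods)",
    "Image Enhancement (methods)",
    "Image Interpretation, Computer-Assisted (methods)",
    "Information Storage and Retrieval (methods)",
    "Language",
    "Pattern Recognition, Automated (methods)",
    "Reading",
]

# Assumption: a qualifier is one trailing parenthesized group, no nested parens.
QUALIFIED = re.compile(r"^(?P<heading>.+?) \((?P<qualifier>[^()]+)\)$")


def group_by_qualifier(entries):
    """Group headings by qualifier; unqualified headings go under None."""
    groups = defaultdict(list)
    for entry in entries:
        match = QUALIFIED.match(entry)
        if match:
            groups[match.group("qualifier")].append(match.group("heading"))
        else:
            groups[None].append(entry)
    return dict(groups)


if __name__ == "__main__":
    for qualifier, headings in group_by_qualifier(KWD_EN).items():
        print(qualifier, "->", "; ".join(headings))
```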

Abstract

This paper presents a document retrieval technique that can search document images without OCR (optical character recognition). The proposed technique retrieves document images using a new word shape coding scheme, which captures document content by annotating each word image with a word shape code. In particular, we annotate word images using a set of topological shape features, including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
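The abstract describes word shape coding only at a high level. The Python sketch that follows illustrates the general idea of coding words by shape and retrieving documents by keyword. It is a toy under loudly stated assumptions, not the authors' code table or matching method.

To stay self-contained, the sketch derives each word's code from its spelling. It uses a hypothetical table covering two of the features named above:

- the letter's vertical zone (ascender, descender or x-height only);
- whether the letter encloses a hole.

In the paper these features are extracted from word images. Water reservoirs are left out here entirely. The names `word_code`, `build_index` and `search` are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified feature table for lowercase Latin letters in a
# typical roman font; NOT the code table used in the paper. Dots on i/j are
# ignored and water reservoirs are not modelled.
ASCENDERS = set("bdfhklt")    # rise above the x-height
DESCENDERS = set("gjpqy")     # drop below the baseline
HOLES = set("abdegopq")       # assumed to enclose one hole (single-storey g)

# (vertical zone, has hole) -> one code symbol per letter
SYMBOL = {
    ("x", False): "x", ("x", True): "o",
    ("A", False): "A", ("A", True): "B",
    ("D", False): "D", ("D", True): "P",
}


def char_code(ch: str) -> str:
    """Code one letter from its vertical zone and hole feature."""
    zone = "A" if ch in ASCENDERS else "D" if ch in DESCENDERS else "x"
    return SYMBOL[(zone, ch in HOLES)]


def word_code(word: str) -> str:
    """Map a word to its shape code, ignoring case and non-letters."""
    return "".join(char_code(c) for c in word.lower() if c.isalpha())


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Map each word shape code to per-document occurrence counts."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for token in text.split():
            code = word_code(token)
            if code:
                index[code][doc_id] += 1
    return index


def search(index: dict[str, Counter], keyword: str) -> list[str]:
    """Rank documents by how often the keyword's shape code occurs in them."""
    hits = index.get(word_code(keyword), Counter())
    return [doc_id for doc_id, _ in hits.most_common()]


if __name__ == "__main__":
    docs = {
        "d1": "word shape coding for document image retrieval",
        "d2": "a ward of the hospital",
        "d3": "optical character recognition",
    }
    index = build_index(docs)
    print(word_code("word"), word_code("ward"))  # xoxB xoxB
    print(search(index, "word"))                 # ['d1', 'd2']
```

Because several letters share a symbol, distinct words can collide. In this toy table, "word" and "ward" both map to `xoxB`, so a query for "word" also returns `d2`. Adding more features, such as the water reservoirs the abstract mentions, is one way to reduce such collisions.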

DOI: 10.1109/TPAMI.2008.89
PubMed: 18787240

Affiliations:

Singapore

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Document image retrieval through word shape coding.</title>
<author><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace, Singapore. slu@i2r.a-star.edu.sg</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace</wicri:regionArea>
<wicri:noRegion>21 Heng Mui Keng Terrace</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Li, Linlin" sort="Li, Linlin" uniqKey="Li L" first="Linlin" last="Li">Linlin Li</name>
</author>
<author><name sortKey="Tan, Chew Lim" sort="Tan, Chew Lim" uniqKey="Tan C" first="Chew Lim" last="Tan">Chew Lim Tan</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2008">2008</date>
<idno type="doi">10.1109/TPAMI.2008.89</idno>
<idno type="RBID">pubmed:18787240</idno>
<idno type="pmid">18787240</idno>
<idno type="wicri:Area/PubMed/Corpus">000048</idno>
<idno type="wicri:Area/PubMed/Curation">000048</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000048</idno>
<idno type="wicri:Area/Ncbi/Merge">000058</idno>
<idno type="wicri:Area/Ncbi/Curation">000058</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000058</idno>
<idno type="wicri:Area/Main/Merge">000B30</idno>
<idno type="wicri:Area/Main/Curation">000B18</idno>
<idno type="wicri:Area/Main/Exploration">000B18</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Document image retrieval through word shape coding.</title>
<author><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace, Singapore. slu@i2r.a-star.edu.sg</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace</wicri:regionArea>
<wicri:noRegion>21 Heng Mui Keng Terrace</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Li, Linlin" sort="Li, Linlin" uniqKey="Li L" first="Linlin" last="Li">Linlin Li</name>
</author>
<author><name sortKey="Tan, Chew Lim" sort="Tan, Chew Lim" uniqKey="Tan C" first="Chew Lim" last="Tan">Chew Lim Tan</name>
</author>
</analytic>
<series><title level="j">IEEE transactions on pattern analysis and machine intelligence</title>
<idno type="eISSN">1939-3539</idno>
<imprint><date when="2008" type="published">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Artificial Intelligence</term>
<term>Automatic Data Processing (methods)</term>
<term>Database Management Systems</term>
<term>Databases, Factual</term>
<term>Documentation (methods)</term>
<term>Image Enhancement (methods)</term>
<term>Image Interpretation, Computer-Assisted (methods)</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Language</term>
<term>Pattern Recognition, Automated (methods)</term>
<term>Reading</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Automatic Data Processing</term>
<term>Documentation</term>
<term>Image Enhancement</term>
<term>Image Interpretation, Computer-Assisted</term>
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Artificial Intelligence</term>
<term>Database Management Systems</term>
<term>Databases, Factual</term>
<term>Language</term>
<term>Reading</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a document retrieval technique that is capable of searching document images without OCR (optical character recognition). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.</div>
</front>
</TEI>
<affiliations><list><country><li>Singapour</li>
</country>
</list>
<tree><noCountry><name sortKey="Li, Linlin" sort="Li, Linlin" uniqKey="Li L" first="Linlin" last="Li">Linlin Li</name>
<name sortKey="Tan, Chew Lim" sort="Tan, Chew Lim" uniqKey="Tan C" first="Chew Lim" last="Tan">Chew Lim Tan</name>
</noCountry>
<country name="Singapour"><noRegion><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
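Such records can also be processed programmatically. A minimal Python sketch using the standard library's `xml.etree.ElementTree` follows.

One practical wrinkle: as displayed on this page, the record uses the `wicri:` and `nlm:` prefixes without declaring them, so a strict XML parser rejects it. The sketch binds them to hypothetical placeholder URIs (`urn:example:wicri`, `urn:example:nlm`) on a wrapper element before parsing. `RECORD` is a trimmed excerpt of the record above, with unused attributes dropped, and `parse_record` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (hypothetical): the record as displayed uses the wicri: and
# nlm: prefixes without declaring them, which a strict XML parser rejects.
NS = {"wicri": "urn:example:wicri", "nlm": "urn:example:nlm"}

# Trimmed excerpt of the record above, keeping only the elements read below.
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">Document image retrieval through word shape coding.</title>
<author><name first="Shijian" last="Lu">Shijian Lu</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Infocomm Research, Agency for Science,Technology and Research (A*STAR), 21 Heng Mui Keng Terrace, Singapore. slu@i2r.a-star.edu.sg</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
</affiliation>
</author>
<author><name first="Linlin" last="Li">Linlin Li</name></author>
<author><name first="Chew Lim" last="Tan">Chew Lim Tan</name></author>
</titleStmt>
<publicationStmt><date when="2008">2008</date>
<idno type="doi">10.1109/TPAMI.2008.89</idno>
<idno type="pmid">18787240</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def parse_record(xml_text: str) -> dict:
    """Extract title, authors, year, identifiers and affiliations."""
    # Bind the undeclared prefixes on a wrapper element so parsing succeeds.
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    root = ET.fromstring(f"<wrapper {decls}>{xml_text}</wrapper>")
    desc = root.find("record/TEI/teiHeader/fileDesc")
    return {
        "title": desc.findtext("titleStmt/title"),
        "authors": [n.text for n in desc.findall("titleStmt/author/name")],
        "year": desc.find("publicationStmt/date").get("when"),
        "ids": {i.get("type"): i.text for i in desc.findall("publicationStmt/idno")},
        "affiliations": [a.text for a in desc.findall(".//nlm:affiliation", NS)],
    }


if __name__ == "__main__":
    info = parse_record(RECORD)
    print(info["title"])
    print("; ".join(info["authors"]))
    print("DOI:", info["ids"]["doi"])
```

Wrapping the fragment keeps the source text untouched. Declaring the prefixes once in `NS` also lets the same mapping drive prefixed lookups such as `nlm:affiliation`.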

To process this document under Unix (Dilib)

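# Main/Exploration step of the OcrV1 area
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B18 | SxmlIndent | more

# Alternative form; assumes EXPLOR_AREA points to the OcrV1 area directory
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000B18 | SxmlIndent | more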

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:18787240
   |texte=   Document image retrieval through word shape coding.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:18787240" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

Exploration server on OCR
Warning: this site is under development. Warning: this site was generated automatically from raw corpora, so the information has not been validated.